Porting and Measuring the Linpack Benchmark on Gamma

Authors

Giovanni Chiola

Giuseppe Ciaccio

Abstract

GAMMA (Genoa Active Message Machine) is a high-performance communication layer implemented at kernel level as an extension of the Linux operating system. It is based on Active Ports, a communication mechanism derived from Active Messages. On low-cost clusters of Personal Computers (PCs) connected by Fast Ethernet, GAMMA achieves much better communication performance than MPI and PVM. We have implemented and run the Linpack benchmark on a small cluster of PCs running GAMMA: four Pentium II 300 MHz PCs connected by shared Fast Ethernet. The GAMMA version of Linpack takes into account both the features (direct broadcast) and the limitations (shared LAN) of the underlying interconnect. Using a highly optimized BLAS library, we were able to achieve more than 500 MFLOPS with only four processors. These results are compared with those of the MPI version of the same benchmark.

Networks of Workstations (NOWs) have emerged as the first cost-effective parallel architecture. Clusters of high-end Personal Computers (PCs) are emerging as an even better solution, with unbeaten price/performance figures and potentially good absolute performance levels. A serious obstacle to running parallel message-passing software on a cluster of PCs is the high communication latency of standard environments such as PVM [12] and MPI [13]. These environments run atop industry-standard low-level communication protocols such as TCP and UDP. Several teams have recently been working on efficient solutions that use faster networks and optimized communication software to keep latency as low as possible. Many of these attempts have given rise to non-standard programming interfaces for high-performance communication. Porting a non-trivial parallel application to a non-standard communication layer may be an expensive task. However, a better price/performance ratio and a satisfactory absolute performance level on a cluster of PCs may justify the porting effort.
Moreover, such porting activities shed light on the potential advantages of using performance-oriented rather than general-purpose implementations of MPI and PVM on clusters of PCs. In this paper we discuss our experience of parallelizing the Linpack benchmark and measuring its performance on a small, low-cost, yet very efficient cluster of PCs. The cluster is composed of four Pentium II 300 MHz PCs networked by a shared 100BASE-TX Ethernet LAN. Each PC runs the Linux operating system extended with an efficient custom messaging system called the Genoa Active Message MAchine (GAMMA).

II The Genoa Active Message MAchine

GAMMA [7], [10] is an efficient messaging system based on Active Ports [8], a communication mechanism …
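The abstract reports Linpack performance as an MFLOPS rate. The following sketch illustrates how such a rate is commonly derived from a run time. It assumes the conventional LINPACK operation count of 2/3 n^3 + 2 n^2 floating-point operations for an n-by-n system, covering the LU factorization plus the triangular solves. The text above does not state the exact counting rule the authors used. The function names and the example problem size are hypothetical.

```python
# Sketch: converting a Linpack run time into an MFLOPS rate.
# Assumption: the conventional LINPACK operation count 2/3 n^3 + 2 n^2;
# function names are hypothetical, not taken from the paper.


def linpack_flops(n: int) -> float:
    """Operations conventionally counted for solving a dense n-by-n system:
    about 2/3 n^3 for LU factorization and 2 n^2 for the triangular solves."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def mflops(n: int, seconds: float) -> float:
    """Achieved rate in millions of floating-point operations per second."""
    if seconds <= 0:
        raise ValueError("run time must be positive")
    return linpack_flops(n) / seconds / 1e6


def max_seconds_for_rate(n: int, target_mflops: float) -> float:
    """Longest run time for which a size-n run still reaches target_mflops."""
    return linpack_flops(n) / (target_mflops * 1e6)


if __name__ == "__main__":
    n = 1000  # hypothetical problem size, not taken from the paper
    print(f"{linpack_flops(n):.0f} flops")
    print(f"{mflops(n, 2.0):.1f} MFLOPS for a 2.0 s run")
    print(f"{max_seconds_for_rate(n, 500.0):.3f} s needed for 500 MFLOPS")
```

Under this assumed count, a hypothetical run with n = 1000 would have to finish in about 1.34 seconds to exceed 500 MFLOPS. Larger problem sizes allow proportionally longer run times, because the n^3 term dominates.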